124 research outputs found

    Accurate determination of node and arc multiplicities in de Bruijn graphs using conditional random fields

    Get PDF
    Background: De Bruijn graphs are key data structures for the analysis of next-generation sequencing data. They efficiently represent the overlap between reads and hence, also the underlying genome sequence. However, sequencing errors and repeated subsequences render the identification of the true underlying sequence difficult. A key step in this process is the inference of the multiplicities of nodes and arcs in the graph. These multiplicities correspond to the number of times eachk-mer (resp.k+1-mer) implied by a node (resp. arc) is present in the genomic sequence. Determining multiplicities thus reveals the repeat structure and presence of sequencing errors. Multiplicities of nodes/arcs in the de Bruijn graph are reflected in their coverage, however, coverage variability and coverage biases render their determination ambiguous. Current methods to determine node/arc multiplicities base their decisions solely on the information in nodes and arcs individually, under-utilising the information present in the sequencing data. Results: To improve the accuracy with which node and arc multiplicities in a de Bruijn graph are inferred, we developed a conditional random field (CRF) model to efficiently combine the coverage information within each node/arc individually with the information of surrounding nodes and arcs. Multiplicities are thus collectively assigned in a more consistent manner. Conclusions: We demonstrate that the CRF model yields significant improvements in accuracy and a more robust expectation-maximisation parameter estimation. Truek-mers can be distinguished from erroneousk-mers with a higher F(1)score than existing methods. A C++11 implementation is available atunder the GNU AGPL v3.0 license

    The role of game rules in architectural design environments

    Get PDF
    'Experimenting' and 'observing' are crucial actions in architectural design thinking. They rely heavily on the representation environment used (e.g. sketching, scale models, sketch tools, CAD tools, etc.) and the 'game rules' at play in these environments. In this brief paper, we study the role of this representation environment in the overall architectural design thinking process. From this brief study, we indicate two design and implementation approaches to implement and design with such game rules in virtual design environments

    Speeding up Martins' algorithm for multiple objective shortest path problems

    Get PDF
    The latest transportation systems require the best routes in a large network with respect to multiple objectives simultaneously to be calculated in a very short time. The label setting algorithm of Martins efficiently finds this set of Pareto optimal paths, but sometimes tends to be slow, especially for large networks such as transportation networks. In this article we investigate a number of speedup measures, resulting in new algorithms. It is shown that the calculation time to find the Pareto optimal set can be reduced considerably. Moreover, it is mathematically proven that these algorithms still produce the Pareto optimal set of paths

    Policy-compliant maximum network flows

    Get PDF
    Computer network administrators are often interested in the maximal bandwidth that can be achieved between two nodes in the network, or how many edges can fail before the network gets disconnected. Classic maximum flow algorithms that solve these problems are well-known. However, in practice, network policies are in effect, severely restricting the flow that can actually be set up. These policies are put into place to conform to service level agreements and optimize network throughput, and can have a large impact on the actual routing of the flows. In this work, we model the problem and define a series of progressively more complex conditions and algorithms that calculate increasingly tighter bounds on the policy-compliant maximum flow using regular expressions and finite state automata. To the best of our knowledge, this is the first time that specific conditions are deduced, which characterize how to calculate policy-compliant maximum flows using classic algorithms on an unmodified network

    Accurate determination of node and arc multiplicities in de bruijn graphs using conditional random fields

    Get PDF
    Background: De Bruijn graphs are key data structures for the analysis of next-generation sequencing data. They efficiently represent the overlap between reads and hence, also the underlying genome sequence. However, sequencing errors and repeated subsequences render the identification of the true underlying sequence difficult. A key step in this process is the inference of the multiplicities of nodes and arcs in the graph. These multiplicities correspond to the number of times eachk-mer (resp.k+1-mer) implied by a node (resp. arc) is present in the genomic sequence. Determining multiplicities thus reveals the repeat structure and presence of sequencing errors. Multiplicities of nodes/arcs in the de Bruijn graph are reflected in their coverage, however, coverage variability and coverage biases render their determination ambiguous. Current methods to determine node/arc multiplicities base their decisions solely on the information in nodes and arcs individually, under-utilising the information present in the sequencing data. Results: To improve the accuracy with which node and arc multiplicities in a de Bruijn graph are inferred, we developed a conditional random field (CRF) model to efficiently combine the coverage information within each node/arc individually with the information of surrounding nodes and arcs. Multiplicities are thus collectively assigned in a more consistent manner. Conclusions: We demonstrate that the CRF model yields significant improvements in accuracy and a more robust expectation-maximisation parameter estimation. Truek-mers can be distinguished from erroneousk-mers with a higher F(1)score than existing methods. A C++11 implementation is available atunder the GNU AGPL v3.0 license

    OMSim : a simulator for optical map data

    Get PDF
    Motivation: The Bionano Genomics platform allows for the optical detection of short sequence patterns in very long DNA molecules (up to 2.5 Mbp). Molecules with overlapping patterns can be assembled to generate a consensus optical map of the entire genome. In turn, these optical maps can be used to validate or improve de novo genome assembly projects or to detect large-scale structural variation in genomes. Simulated optical map data can assist in the development and benchmarking of tools that operate on those data, such as alignment and assembly software. Additionally, it can help to optimize the experimental setup for a genome of interest. Such a simulator is currently not available. Results: We have developed a simulator, OMSim, that produces synthetic optical map data that mimics real Bionano Genomics data. These simulated data have been tested for compatibility with the Bionano Genomics Irys software system and the Irys-scaffolding scripts. OMSim is capable of handling very large genomes (over 30 Gbp) with high throughput and low memory requirements

    Efficiently counting all orbits of graphlets of any order in a graph using autogenerated equations

    Get PDF
    Motivation: Graphlets are a useful tool to determine a graph's small-scale structure. Finding them is exponentially hard with respect to the number of nodes in each graphlet. Therefore, equations can be used to reduce the size of graphlets that need to be enumerated to calculate the number of each graphlet touching each node. Hocevar and Demsar first introduced such equations, which were derived manually, and an algorithm that uses them, but only graphlets with four or five nodes can be counted this way. Results: We present a new algorithm for orbit counting, which is applicable to graphlets of any order. This algorithm uses a tree structure to simplify finding orbits, and stabilizers and symmetry-breaking constraints to ensure correctness. This method gives a significant speedup compared to a brute force counting method and can count orbits beyond the capacity of other available tools
    • …
    corecore